Advanced Statistical Inference II

Chelsea Parlett-Pelleriti

Review

Bayesian Regression

Most of what we need for a Bayesian regression is the same as for a Frequentist regression, except that we also need priors

  • Uninformative
  • Weakly Informative (Regularizing)
  • Strongly Informative

❓How can we check whether our priors are appropriate?

Predictive Checks

Predictive Distributions

🤔 How do we use distributions of parameters (prior or posterior) to make predictions?

Predictive Distributions

Two sources of variability:

  1. Distributional (Prior or Posterior) Variability

  2. Sampling (Data) Variability

Predictive Distributions

Distributional Variability

How certain are we about the value of the parameters?

Predictive Distributions

Sampling Variability

How much will observed data typically vary from its expected value? In other words, the inherent variability of the data around the true value.

Predictive Distributions

\[ p(\tilde{x} \mid x) \]

Probability of new sample \(\tilde{x}\) given the current data \(x\).
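Written out, the predictive distribution averages the data model over the distribution of the parameters (prior or posterior):

\[ p(\tilde{x} \mid x) = \int p(\tilde{x} \mid \theta)\, \pi(\theta \mid x)\, d\theta \]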

  1. draw \(\theta_i \sim \pi(\theta)\) (one sample of all the model parameters)

  2. draw \(\tilde{x}_i \sim p(\tilde{x} \mid \theta_i)\) (a sample of data from the model implied by \(\theta_i\))

Step 1 accounts for uncertainty about the parameters \(\theta\); step 2 accounts for uncertainty in the sample.
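A minimal sketch of these two steps in base R, using a hypothetical model (prior: \(\theta \sim \mathcal{N}(0, 1)\); data model: \(y \mid \theta \sim \mathcal{N}(\theta, 2)\)):

```r
# step-by-step predictive simulation (hypothetical prior and data model)
set.seed(1)
n_draws <- 4000
theta   <- rnorm(n_draws, mean = 0, sd = 1)      # step 1: draw parameters
y_tilde <- rnorm(n_draws, mean = theta, sd = 2)  # step 2: draw data given each theta
var(y_tilde)  # roughly 1^2 + 2^2 = 5: both sources of variability contribute
```

Note that the variance of the predictive draws is larger than either source alone — distributional and sampling variability stack.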

Predictive Distributions

Two sources of variability:

  1. Distributional (Prior or Posterior) Variability

  2. Sampling (Data) Variability

Prior Predictive Check

Prior predictive checks allow us to simulate samples from our prior and make sure they line up with our expectations about reality.

# bayes
priors <- c(
  prior("normal(0,5)", class = "b"),
  prior("normal(0,4)", class = "Intercept"),
  prior("gamma(0.1, 10)", class = "sigma")
)
mod_0 <- brm(y ~ x, data = df, prior = priors, sample_prior = "only") 
mod_0 |> pp_check()

Prior Predictive Check

Let’s adjust our priors to be more realistic.

# bayes
priors <- c(
  prior("normal(0,1)", class = "b"),
  prior("normal(0,1)", class = "Intercept"),
  prior("gamma(1, 1)", class = "sigma")
)
mod_1 <- brm(y ~ x, data = df, prior = priors, sample_prior = "only")
mod_1 |> pp_check()

Posterior Predictive Check

Posterior predictive checks do something similar, but simulate new data based on the posterior. This allows us to “look for systematic discrepancies between real and simulated data” (Gelman et al. 2004), which tells us a little about model fit.

# bayes
priors <- c(
  prior("normal(0,1)", class = "b"),
  prior("normal(0,1)", class = "Intercept"),
  prior("gamma(1, 1)", class = "sigma")
)
m1 <- brm(y ~ x, data = df, prior = priors) 
m1 |> pp_check()

Posterior P-Values

Remember, p-values tell you:

given a distribution \(\pi(\theta)\), what is the probability of observing a \(\theta\) as or more extreme than the one we did? In other words, how consistent is \(\theta\) with \(\pi(\theta)\)?

# generate samples from the posterior predictive distribution
yrep <- posterior_predict(m1, ndraws = 500)

# overall mean function
overall_mu <- function(x) mean(x)

# calc for actual data
overall_mu(df$y) 
[1] 0.236782
# calc for posterior samples
ppc_stat(df$y, yrep, stat = "overall_mu", binwidth = 0.005)
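The plot from `ppc_stat()` can also be turned into a number: the posterior p-value is the share of replicated statistics as or more extreme than the observed one. A sketch (`yrep` is simulated here so the snippet stands alone; with a fitted model you would use the `posterior_predict()` output from above):

```r
# posterior p-value by hand (yrep simulated for illustration)
set.seed(2)
yrep      <- matrix(rnorm(500 * 100, mean = 0.2, sd = 1), nrow = 500)
obs_stat  <- 0.236782               # overall mean of the real data (from above)
rep_stats <- apply(yrep, 1, mean)   # same statistic on each replicated data set
mean(rep_stats >= obs_stat)         # one-sided posterior p-value
```

Values near 0 or 1 suggest the model systematically misses this feature of the data; values near 0.5 suggest consistency.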

Bayesian Regression in R

mtcars Data

library(tidyverse)
data(mtcars)
glimpse(mtcars)
Rows: 32
Columns: 11
$ mpg  <dbl> 21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8,…
$ cyl  <dbl> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4, 8,…
$ disp <dbl> 160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 16…
$ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180…
$ drat <dbl> 3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,…
$ wt   <dbl> 2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.…
$ qsec <dbl> 16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18…
$ vs   <dbl> 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0,…
$ am   <dbl> 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0,…
$ gear <dbl> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3,…
$ carb <dbl> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2,…

General Model Setup

mod1 <- brm(mpg ~ hp + gear, data = mtcars) 
get_prior(mod1)
                   prior     class coef group resp dpar nlpar lb ub
                  (flat)         b                                 
                  (flat)         b gear                            
                  (flat)         b   hp                            
 student_t(3, 19.2, 5.4) Intercept                                 
    student_t(3, 0, 5.4)     sigma                             0   
       source
      default
 (vectorized)
 (vectorized)
      default
      default

General Model Setup

priors <- c(
  prior("normal(0,5)", class = "b")
)

mod2 <- brm(mpg ~ hp + gear, data = mtcars,
            prior = priors) 
get_prior(mod2)
                   prior     class coef group resp dpar nlpar lb ub
             normal(0,5)         b                                 
             normal(0,5)         b gear                            
             normal(0,5)         b   hp                            
 student_t(3, 19.2, 5.4) Intercept                                 
    student_t(3, 0, 5.4)     sigma                             0   
       source
         user
 (vectorized)
 (vectorized)
      default
      default

Prior Predictive Check

mod2_pp <- brm(mpg ~ hp + gear, data = mtcars,
            prior = priors, sample_prior = "only") 
mod2_pp |> pp_check()

Summary

summary(mod2)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: mpg ~ hp + gear 
   Data: mtcars (Number of observations: 32) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    18.02      3.32    11.49    24.57 1.00     4135     3048
hp           -0.06      0.01    -0.08    -0.05 1.00     4546     2822
gear          3.11      0.77     1.55     4.64 1.00     4361     2848

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     3.22      0.44     2.48     4.21 1.00     3854     3258

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Convergence

library(bayesplot)
summary(mod2)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: mpg ~ hp + gear 
   Data: mtcars (Number of observations: 32) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    18.02      3.32    11.49    24.57 1.00     4135     3048
hp           -0.06      0.01    -0.08    -0.05 1.00     4546     2822
gear          3.11      0.77     1.55     4.64 1.00     4361     2848

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     3.22      0.44     2.48     4.21 1.00     3854     3258

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
mcmc_trace(mod2)

Posterior Predictive Check

mod2 |> pp_check()

Posterior Examination

mcmc_areas(mod2,
           area_method = "scaled height",
           pars = c("b_hp",
                    "b_gear")) + 
  geom_vline(xintercept = c(-0.1,0.1),
             color = "red",
             linetype = "dashed") # ROPE

(more on ROPE with a brms model here)
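The ROPE decision can also be quantified as the share of the posterior inside \((-0.1, 0.1)\). With the real model you could extract draws via something like `as_draws_df(mod2)$b_hp`; here we use stand-in draws matching the summary above, so the sketch runs on its own:

```r
# proportion of posterior draws for b_hp inside the ROPE (-0.1, 0.1)
set.seed(3)
b_hp <- rnorm(4000, mean = -0.06, sd = 0.01)  # stand-in posterior draws
mean(b_hp > -0.1 & b_hp < 0.1)                # share of the posterior in the ROPE
```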

Bayesian Logistic Regression in R

# s/o to Chat GPT for helping simulate the data
# libs
library(brms)
library(bayesplot)
library(tidybayes)

# sim data
set.seed(540)
n <- 200
parental_income <- rnorm(n, mean = 50, sd = 10) # income_in_k
z <- 1 + 0.05 * parental_income + rnorm(n, 0, 1)
p <- 1 / (1 + exp(-z))
passed_exam <- rbinom(n, 1, p) 
df <- data.frame(parental_income, passed_exam)

# priors
priors <- c(
  prior(normal(0, 5), class = "b"),
  prior(normal(0, 5), class = "Intercept")
)

# fit
fit <- brm(passed_exam ~ parental_income, data = df,
           family = bernoulli(), prior = priors,
           seed = 123, chains = 4, cores = 4, iter = 2000)
# summary
summary(fit)
 Family: bernoulli 
  Links: mu = logit 
Formula: passed_exam ~ parental_income 
   Data: df (Number of observations: 200) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           1.28      1.69    -2.00     4.56 1.00     2279     2473
parental_income     0.04      0.04    -0.03     0.11 1.00     2082     2141

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
# convergence
mcmc_trace(fit) 

# pp_check
pp_check(fit)

# plot params
mcmc_areas(fit,
           pars = c("b_parental_income"))

Bayesian Mixed Effect Models

# s/o to Chat GPT for helping simulate the data
# Load required libraries
library(brms)
library(bayesplot)
library(tidybayes)

# sim
set.seed(123)
n_schools <- 10
n_students <- 20
total_n <- n_schools * n_students

school <- factor(rep(1:n_schools, each = n_students))
parental_income <- rnorm(total_n, mean = 50, sd = 10) # Parental income in thousands of dollars
school_effect <- rnorm(n_schools, 0, 5)[school]
individual_error <- rnorm(total_n, 0, 5)
exam_score <- 50 + 0.5 * parental_income + school_effect + individual_error

df <- data.frame(school, parental_income, exam_score)

# priors
priors <- c(
  prior(normal(0, 10), class = "b"),
  prior(normal(50, 10), class = "Intercept"),
  prior(exponential(1), class = "sd")
)

# fit
fit <- brm(exam_score ~ parental_income + (1 + parental_income | school),
           data = df, family = gaussian(), prior = priors,
           seed = 123, chains = 4, cores = 4, iter = 2000)
# summary
summary(fit)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: exam_score ~ parental_income + (1 + parental_income | school) 
   Data: df (Number of observations: 200) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~school (Number of levels: 10) 
                               Estimate Est.Error l-95% CI u-95% CI Rhat
sd(Intercept)                      1.07      1.01     0.02     3.82 1.00
sd(parental_income)                0.11      0.04     0.06     0.20 1.01
cor(Intercept,parental_income)     0.05      0.56    -0.93     0.95 1.01
                               Bulk_ESS Tail_ESS
sd(Intercept)                      1598     1122
sd(parental_income)                 659     1090
cor(Intercept,parental_income)      420      847

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          51.81      1.94    48.04    55.71 1.00     5579     2443
parental_income     0.48      0.05     0.37     0.57 1.00     1501     1300

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     4.93      0.25     4.46     5.46 1.00     4824     2748

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
# converge
mcmc_trace(fit) 

# pp_check
pp_check(fit)

# plot
mcmc_areas(fit, pars = c("b_parental_income"))

Frequentist Regression

Frequentist Regression

Now that we’ve seen how to fit Bayesian regression models, Frequentist models will be a walk in the park! We don’t have to think about priors or prior predictive checks!

But we do need to think about how to read the results.

Common Hypothesis Tests

  • t-test

  • z-test

  • chi-square test

Z-test

z-statistic:

\[ z = \frac{\bar{x} - \mu_0}{se} \]

where \(\bar{x}\) is the observed sample statistic, \(\mu_0\) is the value of the statistic expected under \(H_0\), and \(se\) is the standard error of the sample statistic.

The z-statistic measures how far the sample statistic is from the value expected under \(H_0\): it is a z-score computed with respect to the null distribution \(\mathcal{N}(\mu_0, se)\).


❓when sample statistics are compatible with \(H_0\), what type of z-statistics would you expect to see?

Z-test

Because we divide by \(se\), z-statistics are standardized, meaning that we can use the same standard cutoffs to create Neyman-Pearson rejection regions! z-statistics more extreme than \(z = \pm 1.645\) account for the 10% most extreme values. For the 5% most extreme we use \(z = \pm 1.96\).
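These cutoffs come straight from the standard normal quantile function:

```r
# two-tailed rejection cutoffs from the standard normal
qnorm(0.95)   # 1.645 -> cutoff for the 10% most extreme values
qnorm(0.975)  # 1.960 -> cutoff for the 5% most extreme values
```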

Z-test

In a Fisherian or NHST framework, the z-distribution can also help us calculate p-values. Here \(p = 0.7\).
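For instance (with a hypothetical z-statistic chosen to match the p-value above), a two-sided p-value is twice the tail area of the standard normal:

```r
# two-sided p-value from a hypothetical z-statistic
z <- 0.385
2 * pnorm(-abs(z))  # about 0.70
```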

Z-test

Common z-tests:

  • z-tests for means (known \(\sigma\))

  • z-tests for proportions (believe it or not, \(\sigma\) also known)

Z-test: Example

# is the mean weight of graduate student hydroflasks different from 500g?
# hydroflasks at Chapman have a known sd of 150g

# h0: mu = 500
# ha: mu != 500

# by hand
(mean(dat) - 500)/(150/sqrt(length(dat)))
[1] 2.176504
# using alpha = 0.1, we reject the null! there is a stat sig difference in the weight of grad student water bottles

t-test

t-statistic:

\[ t = \frac{\bar{x} - \mu_0}{se} \]

where \(\bar{x}\) is the observed sample statistic, \(\mu_0\) is the sample statistic expected under \(H_0\), and \(se\) is the standard error.

t-statistics also measure how far the sample statistic is from the value expected under \(H_0\).


❓when sample statistics are compatible with \(H_0\), what type of t-statistics would you expect to see?

t-test

Because we divide by \(se\), t-statistics are standardized, meaning that we can use the same standard cutoffs (depending on df) to create Neyman-Pearson rejection regions!

In a Fisherian or NHST framework, the t-distribution can also help us calculate p-values.
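A sketch of the one-sample t-test “by hand” on simulated data (the numbers here are hypothetical, not the bumblebee data used below), checked against `t.test()`:

```r
# one-sample t-test by hand, on simulated data
set.seed(4)
data   <- rnorm(100, mean = 175, sd = 10)
n      <- length(data)
t_stat <- (mean(data) - 175) / (sd(data) / sqrt(n))
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p from t distribution
all.equal(p_val, unname(t.test(data, mu = 175)$p.value))  # TRUE
```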

t-test

Common t-tests:

  • t-tests for mean (unknown \(\sigma\))

  • t-test for regression coefficients

  • t-test for difference between means

t-test: Example

# is the mean weight of bumblebees in florida different than the mean bumblebee weight of 175mg?

t.test(data,
       alternative = "two.sided",
       mu = 175)

    One Sample t-test

data:  data
t = -0.3004, df = 99, p-value = 0.7645
alternative hypothesis: true mean is not equal to 175
95 percent confidence interval:
 172.7395 176.6660
sample estimates:
mean of x 
 174.7028 

t-test: Example

library(TOSTER)

tsum_TOST(m1 = mean(data),
          mu = 175,
          sd1 = sd(data),
          n1 = length(data),
          low_eqbound = 170,
          high_eqbound = 180)

One-sample t-test

The equivalence test was significant, t(99) = 4.753, p = 3.4e-06
The null hypothesis test was non-significant, t(99) = -0.300, p = 7.65e-01
NHST: don't reject null significance hypothesis that the effect is equal to 175 
TOST: reject null equivalence hypothesis

TOST Results 
                 t df p.value
t-test     -0.3004 99   0.765
TOST Lower  4.7530 99 < 0.001
TOST Upper -5.3538 99 < 0.001

Effect Sizes 
           Estimate     SE                 C.I. Conf. Level
Raw         -0.2972 0.9894 [173.0599, 176.3456]         0.9
Hedges's g  17.5226 1.2431   [15.4236, 19.5332]         0.9
Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").

t-test: example

# is the difference in mean weights of bumblebees in florida and california different than 0?

t.test(x = data1,
       y = data2,
       alternative = "two.sided")

    Welch Two Sample t-test

data:  data1 and data2
t = -4.622, df = 195.18, p-value = 6.898e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -8.118912 -3.262520
sample estimates:
mean of x mean of y 
 174.5142  180.2050 

t-test: Example

    city temperature  rainfall bee_weight
1 City 1    24.64579  933.8643   9.294917
2 City 1    22.91725 1049.5396   3.650119
# fit model
m1 <- lm(bee_weight ~ rainfall + temperature,
         data = simulated_data)

# regression table
summary(m1)

Call:
lm(formula = bee_weight ~ rainfall + temperature, data = simulated_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.0941 -1.3528 -0.0109  1.3792  8.6508 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  5.0597404  0.1251260   40.44   <2e-16 ***
rainfall    -0.0050555  0.0001271  -39.78   <2e-16 ***
temperature  0.2991710  0.0045176   66.22   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.999 on 9997 degrees of freedom
Multiple R-squared:  0.3112,    Adjusted R-squared:  0.311 
F-statistic:  2258 on 2 and 9997 DF,  p-value: < 2.2e-16

t-test: Example

    university      age   group gender caffeine_mg gestures
1 University 3 25.11077 Group B  Woman   151.92202       43
2 University 3 23.72048 Group B  Woman    74.42176       29
# does caffeine intake influence the number of gestures used in conversation?
library(lme4)

# z score
data <- data |> mutate(across(all_of(c("age", "caffeine_mg")),
                              ~ (.-mean(.)) / sd(.)))

m2 <- glmer(gestures ~ age + gender + caffeine_mg + (1 | university),
            data = data,
            family = poisson)

m2 |> summary()
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: gestures ~ age + gender + caffeine_mg + (1 | university)
   Data: data

      AIC       BIC    logLik -2*log(L)  df.resid 
    641.5     657.1    -314.7     629.5        94 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.42780 -0.57690  0.04138  0.61653  2.07566 

Random effects:
 Groups     Name        Variance  Std.Dev.
 university (Intercept) 0.0001464 0.0121  
Number of obs: 100, groups:  university, 5

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)       3.72267    0.03285 113.340   <2e-16 ***
age              -0.02396    0.01694  -1.415   0.1572    
genderNon-binary -0.10289    0.04393  -2.342   0.0192 *  
genderWoman      -0.09640    0.04279  -2.253   0.0243 *  
caffeine_mg       0.04686    0.01654   2.834   0.0046 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) age    gndrN- gndrWm
age         -0.225                     
gndrNn-bnry -0.761  0.250              
genderWoman -0.777  0.237  0.617       
caffeine_mg  0.105  0.029 -0.146 -0.119

Frequentist Regression Output

Loading palmerpenguins data

Regression #1

Regression #1


Call:
lm(formula = body_mass_g ~ flipper_length_mm, data = penguins_clean)

Residuals:
     Min       1Q   Median       3Q      Max 
-1057.33  -259.79   -12.24   242.97  1293.89 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)       -5872.09     310.29  -18.93   <2e-16 ***
flipper_length_mm    50.15       1.54   32.56   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 393.3 on 331 degrees of freedom
Multiple R-squared:  0.7621,    Adjusted R-squared:  0.7614 
F-statistic:  1060 on 1 and 331 DF,  p-value: < 2.2e-16

Regression #2

Regression #2


Call:
glm(formula = sex ~ body_mass_g + flipper_length_mm + bill_length_mm, 
    family = binomial(link = "logit"), data = penguins_clean)

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)        8.2569324  2.7636399   2.988  0.00281 ** 
body_mass_g        0.0029604  0.0004188   7.069 1.56e-12 ***
flipper_length_mm -0.1348614  0.0234043  -5.762 8.30e-09 ***
bill_length_mm     0.1458945  0.0332279   4.391 1.13e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 461.61  on 332  degrees of freedom
Residual deviance: 350.73  on 329  degrees of freedom
AIC: 358.73

Number of Fisher Scoring iterations: 4

Regression #3

Regression #3

Regression #1 brms

Chain 4: Iteration: 1001 / 2000 [ 50%]  (Sampling)
Chain 4: Iteration: 1200 / 2000 [ 60%]  (Sampling)
Chain 4: Iteration: 1400 / 2000 [ 70%]  (Sampling)
Chain 4: Iteration: 1600 / 2000 [ 80%]  (Sampling)
Chain 4: Iteration: 1800 / 2000 [ 90%]  (Sampling)
Chain 4: Iteration: 2000 / 2000 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 0.074 seconds (Warm-up)
Chain 4:                0.007 seconds (Sampling)
Chain 4:                0.081 seconds (Total)
Chain 4: 
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: body_mass_g ~ flipper_length_mm 
   Data: penguins_clean (Number of observations: 333) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept         -5876.93    313.54 -6487.91 -5265.71 1.00     3759     2684
flipper_length_mm    50.18      1.56    47.16    53.19 1.00     3800     2764

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma   394.83     15.39   366.06   426.70 1.00     4058     3059

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Regression #2 brms

 Family: bernoulli 
  Links: mu = logit 
Formula: sex ~ body_mass_g + flipper_length_mm + bill_length_mm 
   Data: penguins_clean (Number of observations: 333) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept             8.34      2.73     3.11    13.77 1.00     2144     2165
body_mass_g           0.00      0.00     0.00     0.00 1.00     2102     2259
flipper_length_mm    -0.14      0.02    -0.18    -0.09 1.00     1624     1864
bill_length_mm        0.15      0.03     0.08     0.21 1.00     1800     1655

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Regression #3 brms

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: body_mass_g ~ flipper_length_mm + (1 | species) + (1 | island) 
   Data: penguins_clean (Number of observations: 333) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~island (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)   143.71    153.74     4.75   603.97 1.00      874      862

~species (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)   385.51    259.46    96.05  1103.21 1.00     1038      563

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept         -4199.97    674.25 -5501.69 -2829.28 1.00     2037     1795
flipper_length_mm    41.63      3.11    35.57    47.91 1.00     2587     2331

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma   374.40     14.93   347.03   405.79 1.00     2924     2316

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Practice on your own

Exercise 1:

  • Come up with a checklist for yourself for building a Bayesian Regression
  • Come up with a checklist for yourself for building a Frequentist Regression

Exercise 2:

Choose whether you want to answer this with a Bayesian or Frequentist framework.

Research Questions: What is the effect of sleep deprivation on reaction time? Is the effect consistent across people, or does it vary a lot from person to person?

    Reaction Days Subject
1   249.5600    0     308
2   258.7047    1     308
3   250.8006    2     308
4   321.4398    3     308
5   356.8519    4     308
6   414.6901    5     308
7   382.2038    6     308
8   290.1486    7     308
9   430.5853    8     308
10  466.3535    9     308
11  222.7339    0     309
12  205.2658    1     309
13  202.9778    2     309
14  204.7070    3     309
15  207.7161    4     309
16  215.9618    5     309
17  213.6303    6     309
18  217.7272    7     309
19  224.2957    8     309
20  237.3142    9     309
21  199.0539    0     310
22  194.3322    1     310
23  234.3200    2     310
24  232.8416    3     310
25  229.3074    4     310
26  220.4579    5     310
27  235.4208    6     310
28  255.7511    7     310
29  261.0125    8     310
30  247.5153    9     310
31  321.5426    0     330
32  300.4002    1     330
33  283.8565    2     330
34  285.1330    3     330
35  285.7973    4     330
36  297.5855    5     330
37  280.2396    6     330
38  318.2613    7     330
39  305.3495    8     330
40  354.0487    9     330
41  287.6079    0     331
42  285.0000    1     331
43  301.8206    2     331
44  320.1153    3     331
45  316.2773    4     331
46  293.3187    5     331
47  290.0750    6     331
48  334.8177    7     331
49  293.7469    8     331
50  371.5811    9     331
51  234.8606    0     332
52  242.8118    1     332
53  272.9613    2     332
54  309.7688    3     332
55  317.4629    4     332
56  309.9976    5     332
57  454.1619    6     332
58  346.8311    7     332
59  330.3003    8     332
60  253.8644    9     332
61  283.8424    0     333
62  289.5550    1     333
63  276.7693    2     333
64  299.8097    3     333
65  297.1710    4     333
66  338.1665    5     333
67  332.0265    6     333
68  348.8399    7     333
69  333.3600    8     333
70  362.0428    9     333
71  265.4731    0     334
72  276.2012    1     334
73  243.3647    2     334
74  254.6723    3     334
75  279.0244    4     334
76  284.1912    5     334
77  305.5248    6     334
78  331.5229    7     334
79  335.7469    8     334
80  377.2990    9     334
81  241.6083    0     335
82  273.9472    1     335
83  254.4907    2     335
84  270.8021    3     335
85  251.4519    4     335
86  254.6362    5     335
87  245.4523    6     335
88  235.3110    7     335
89  235.7541    8     335
90  237.2466    9     335
91  312.3666    0     337
92  313.8058    1     337
93  291.6112    2     337
94  346.1222    3     337
95  365.7324    4     337
96  391.8385    5     337
97  404.2601    6     337
98  416.6923    7     337
99  455.8643    8     337
100 458.9167    9     337
101 236.1032    0     349
102 230.3167    1     349
103 238.9256    2     349
104 254.9220    3     349
105 250.7103    4     349
106 269.7744    5     349
107 281.5648    6     349
108 308.1020    7     349
109 336.2806    8     349
110 351.6451    9     349
111 256.2968    0     350
112 243.4543    1     350
113 256.2046    2     350
114 255.5271    3     350
115 268.9165    4     350
116 329.7247    5     350
117 379.4445    6     350
118 362.9184    7     350
119 394.4872    8     350
120 389.0527    9     350
121 250.5265    0     351
122 300.0576    1     351
123 269.8939    2     351
124 280.5891    3     351
125 271.8274    4     351
126 304.6336    5     351
127 287.7466    6     351
128 266.5955    7     351
129 321.5418    8     351
130 347.5655    9     351
131 221.6771    0     352
132 298.1939    1     352
133 326.8785    2     352
134 346.8555    3     352
135 348.7402    4     352
136 352.8287    5     352
137 354.4266    6     352
138 360.4326    7     352
139 375.6406    8     352
140 388.5417    9     352
141 271.9235    0     369
142 268.4369    1     369
143 257.2424    2     369
144 277.6566    3     369
145 314.8222    4     369
146 317.2135    5     369
147 298.1353    6     369
148 348.1229    7     369
149 340.2800    8     369
150 366.5131    9     369
151 225.2640    0     370
152 234.5235    1     370
153 238.9008    2     370
154 240.4730    3     370
155 267.5373    4     370
156 344.1937    5     370
157 281.1481    6     370
158 347.5855    7     370
159 365.1630    8     370
160 372.2288    9     370
161 269.8804    0     371
162 272.4428    1     371
163 277.8989    2     371
164 281.7895    3     371
165 279.1705    4     371
166 284.5120    5     371
167 259.2658    6     371
168 304.6306    7     371
169 350.7807    8     371
170 369.4692    9     371
171 269.4117    0     372
172 273.4740    1     372
173 297.5968    2     372
174 310.6316    3     372
175 287.1726    4     372
176 329.6076    5     372
177 334.4818    6     372
178 343.2199    7     372
179 369.1417    8     372
180 364.1236    9     372
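One way to get a feel for the second question before committing to a framework: fit a separate least-squares slope for each subject and see how much the slopes vary. This base-R sketch uses simulated stand-in data with the same columns as the table above (`Reaction`, `Days`, `Subject`); with the real data, skip the simulation and use the data frame directly.

```r
# Sketch: per-subject slopes to eyeball person-to-person variability.
# The data here are simulated placeholders, not the sleep study data above.
set.seed(1)
subj  <- rep(1:5, each = 10)
sleep <- data.frame(Subject = factor(subj), Days = rep(0:9, times = 5))

# each simulated subject gets their own true slope, plus noise
true_slopes    <- rnorm(5, mean = 10, sd = 5)
sleep$Reaction <- 250 + true_slopes[subj] * sleep$Days + rnorm(50, 0, 20)

# fit Reaction ~ Days separately within each subject
slopes <- sapply(split(sleep, sleep$Subject),
                 function(d) coef(lm(Reaction ~ Days, data = d))["Days"])
round(slopes, 1)  # large spread suggests the effect varies person to person
```

If the per-subject slopes spread out widely, that is exactly the situation a random-slope model (e.g. `Reaction ~ Days + (Days | Subject)`) is built for.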

\(\chi^2\)-test

\(\chi^2\)-test

\(\chi^2\)-statistic:

\[ \chi^2 = \sum \frac{(o-e)^2}{e}\\ \underbrace{\chi = \frac{(o-e)}{\sqrt{e}}}_\text{for demo purposes only} \]

where \(o\) is the observed count in each category and \(e\) is the count expected under \(H_0\), summed over categories.

\(\chi^2\)-statistics also measure how far away the observed data are from what \(H_0\) says to expect.
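To see the "distance" interpretation concretely, here is a toy base-R comparison (the counts are made up for illustration): observed counts near their expected values give a small statistic, counts far away give a large one.

```r
# Toy example: the chi-square statistic as a distance from H0's expected counts
e       <- c(20, 20)  # expected counts under H0 (hypothetical)
o_close <- c(22, 18)  # observed counts close to expectation
o_far   <- c(35,  5)  # observed counts far from expectation

sum((o_close - e)^2 / e)  # 0.4  (small: compatible with H0)
sum((o_far   - e)^2 / e)  # 22.5 (large: incompatible with H0)
```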


❓When sample statistics are compatible with \(H_0\), what size of \(\chi^2\)-statistic would you expect to see?

\(\chi^2\)-test

Common \(\chi^2\) tests:

  • Homogeneity: do two groups have the same distribution for one variable in the dataset?

  • Independence: are two variables in a dataset related?

  • Goodness of Fit: does one variable in the dataset match our expected distribution?

\(\chi^2\)-test

I am satisfied with the quality of my healthcare

            Strong Agree  Agree  Neither  Disagree  Strong Disagree
Blue Shield           15     10       40        20               15

Test of GOF: does the distribution of responses match our goal distribution of 20/25/30/15/10%?

\(\chi^2\)-test

OBSERVED     Strong Agree  Agree       Neither    Disagree    Strong Disagree
Blue Shield            15          10         40          20               15

EXPECTED     Strong Agree  Agree       Neither    Disagree    Strong Disagree
Blue Shield     100 * 0.2  100 * 0.25  100 * 0.3  100 * 0.15  100 * 0.1

\(\chi^2\)-statistic: \(\chi^2 = \sum \frac{(o-e)^2}{e}\)
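Plugging the observed and expected counts from the table into the formula by hand reproduces the test statistic (base R):

```r
# GOF by hand: observed counts vs. counts expected under the goal distribution
o <- c(15, 10, 40, 20, 15)
e <- 100 * c(0.20, 0.25, 0.30, 0.15, 0.10)  # 20, 25, 30, 15, 10

sum((o - e)^2 / e)  # 17.75, matching chisq.test's X-squared
```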

\(\chi^2\)-test

chisq.test(x = c(15,10,40,20,15),
           p = c(0.2,0.25,0.3,0.15,0.1))

    Chi-squared test for given probabilities

data:  c(15, 10, 40, 20, 15)
X-squared = 17.75, df = 4, p-value = 0.001381

\(\chi^2\)-test

I am satisfied with the quality of my healthcare

                   Strong Agree  Agree  Neither  Disagree  Strong Disagree
Blue Shield                  15     10       40        20               15
United Healthcare            10      5       20        40               25

Test of Homogeneity: Is the distribution of responses different for the two groups?

\(\chi^2\)-test

summary_data <- rbind(bs = c(15,10,40,20,15),
                      uh = c(10,5,20,40,25))
chisq.test(summary_data)

    Pearson's Chi-squared test

data:  summary_data
X-squared = 18.5, df = 4, p-value = 0.0009851
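For the homogeneity test, the expected counts come from the table's margins: \(e_{ij} = (\text{row total}_i \times \text{column total}_j) / n\). A base-R check reproduces the statistic:

```r
# Homogeneity by hand: expected counts built from the row/column margins
obs <- rbind(bs = c(15, 10, 40, 20, 15),
             uh = c(10,  5, 20, 40, 25))
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

sum((obs - expected)^2 / expected)  # 18.5, matching chisq.test's X-squared
```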

\(\chi^2\)-test: Example

# Is the proportion of heads different from 0.5? Do we have a fair coin?

prop.test(sum(dat),    # number of heads (a count, not a proportion)
          length(dat), # number of flips
          p = 0.5, alternative = "two.sided")
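The "X-squared" in `prop.test`'s output is no coincidence: a one-sample proportion test is a \(\chi^2\) test on the two counts (heads, tails). A quick check with hypothetical counts (59 heads in 100 flips), turning off the continuity correction so the statistics match exactly:

```r
# prop.test and a two-category chisq.test give the same statistic
prop.test(59, 100, p = 0.5, correct = FALSE)$statistic  # X-squared = 3.24
chisq.test(c(59, 41), p = c(0.5, 0.5))$statistic        # X-squared = 3.24
```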

\(\chi^2\)-test: Example

# Set seed for reproducibility
set.seed(42)

# Define the expected color distribution (e.g., based on a typical Skittles bag)
expected_proportions <- c(red = 0.2, orange = 0.2, yellow = 0.2, green = 0.2, purple = 0.2)

# Simulate the actual observed counts of Skittles in a bag (e.g., a bag of 100 Skittles)
total_skittles <- 100
observed_counts <- rmultinom(1, total_skittles, prob = expected_proportions)[,1]

# print
print("Observed counts:")
[1] "Observed counts:"
print(observed_counts)
   red orange yellow  green purple 
    26     24     15     20     15 
# test
expected_counts <- total_skittles * expected_proportions
chisq_test <- chisq.test(x = observed_counts, p = expected_proportions)
print("Chi-square GOF test result:")
[1] "Chi-square GOF test result:"
print(chisq_test)

    Chi-squared test for given probabilities

data:  observed_counts
X-squared = 5.1, df = 4, p-value = 0.2772

\(\chi^2\)-test: Example

# Set seed for reproducibility
set.seed(123)

# Define the number of students in each group
n_eecs <- 100
n_cads <- 100

# Define the expected distributions of undergraduate majors 
prob_eecs <- c(math = 0.3, business = 0.1, humanities = 0.05, 
               computer_science = 0.4, data_science = 0.15)
prob_cads <- c(math = 0.2, business = 0.25, humanities = 0.1, 
               computer_science = 0.25, data_science = 0.2)

# Simulate data using multinomial distribution
observed_eecs <- rmultinom(1, n_eecs, prob = prob_eecs)[,1]
observed_cads <- rmultinom(1, n_cads, prob = prob_cads)[,1]

# Combine data into a table for the chi-square test
observed_data <- rbind(EECS = observed_eecs, CADS = observed_cads)
colnames(observed_data) <- c("Math", "Business", "Humanities", 
                             "Computer Science", "Data Science")

# print
print("Observed data for EECS and CADS students:")
[1] "Observed data for EECS and CADS students:"
print(observed_data)
     Math Business Humanities Computer Science Data Science
EECS   30        9          8               33           20
CADS   13       27         15               25           20
# test
chisq_test <- chisq.test(observed_data)
print("Chi-square test of homogeneity result:")
[1] "Chi-square test of homogeneity result:"
print(chisq_test)

    Pearson's Chi-squared test

data:  observed_data
X-squared = 18.955, df = 4, p-value = 0.0008022